Model-based STFT phase recovery for audio source separation
Abstract
In audio source separation, it is common to estimate the magnitude of the time-frequency (TF) representation of each source. To recover a time-domain signal, for instance from a spectrogram, it is then necessary to recover the phase of the corresponding complex-valued Short-Time Fourier Transform (STFT). Most authors in this field choose a Wiener-like filtering approach, which boils down to using the phase of the original mixture. This paper adopts a different standpoint. Many music events are partially composed of slowly varying sinusoids, and the STFT phase increment of those frequency components takes a specific form. This allows phase recovery by an unwrapping technique once a short-term frequency estimate has been obtained. Building on these results, a complete iterative source separation procedure is proposed. It is tested on a variety of data, both synthetic and realistic, and in different source separation scenarios, oracle and non-oracle. In terms of signal-to-interference ratio (SIR), signal-to-artifact ratio (SAR), and signal-to-distortion ratio (SDR), the method achieves better performance than consistency-based approaches. To complete the experimental analysis, sound examples are provided so that the reader can assess the improvement in sound quality.
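The unwrapping idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's procedure: it assumes a single stationary sinusoid whose frequency is already known (the paper estimates it), and the names `stft` and `unwrap_phase` are hypothetical helpers written for this illustration. For a sinusoid of normalized frequency nu (cycles per sample) and a hop of S samples, the STFT phase in the channel containing the sinusoid advances by 2*pi*S*nu from one frame to the next, so the whole phase track can be propagated from the first frame.

```python
import numpy as np

def stft(x, win, hop):
    """Plain STFT: frame t starts at sample t*hop, no padding or centering."""
    n = len(win)
    n_frames = 1 + (len(x) - n) // hop
    frames = np.stack([x[t * hop : t * hop + n] * win for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T  # shape (bins, frames)

def unwrap_phase(phi0, nu, hop, n_frames):
    """Propagate the phase: phi[t] = phi[t-1] + 2*pi*hop*nu (nu in cycles/sample)."""
    return phi0 + 2 * np.pi * hop * nu * np.arange(n_frames)

# Illustrative settings (assumptions, not values from the paper).
fs, n, hop = 8000, 512, 128
f0 = 440.0                                   # sinusoid frequency, assumed known
x = np.cos(2 * np.pi * f0 / fs * np.arange(fs) + 0.3)

X = stft(x, np.hanning(n), hop)
k = int(round(f0 / fs * n))                  # frequency bin nearest to f0

# Recover the phase track of bin k from its first-frame phase only.
phi_hat = unwrap_phase(np.angle(X[k, 0]), f0 / fs, hop, X.shape[1])

# Compare with the true STFT phase, wrapping the difference to (-pi, pi].
err = np.angle(np.exp(1j * (phi_hat - np.angle(X[k]))))
max_err = np.max(np.abs(err))
print(f"largest phase error in bin {k}: {max_err:.2e} rad")
```

In this idealized setting the recovered phase matches the true one up to a small leakage term. With real recordings, the frequency would have to be estimated frame by frame, and onsets would need separate handling, since there is no previous frame to propagate from.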
Similar resources
Time-Frequency Trade-offs for Audio Source Separation with Binary Masks
The short-time Fourier transform (STFT) provides the foundation of binary-mask based audio source separation approaches. In computing a spectrogram, the STFT window size parameterizes the trade-off between time and frequency resolution. However, it is not yet known how this parameter affects the operation of the binary mask in terms of separation quality for real-world signals such as speech or...
STFT based Blind Separation of Underdetermined Speech Mixtures
Analysis of non-stationary signals such as audio, speech, and biomedical signals requires good resolution in both time and frequency, as their spectral components are not fixed. Time-frequency analysis of non-stationary signals has many applications, such as source separation and signal denoising. This paper presents an application of time-frequency analysis using STFT, Short Time Fourier Trans...
Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation / Factorisation en matrices à coefficients positifs de données multicanal convolutives pour la séparation de sources audio
We consider inference in a general data-driven object-based model of multichannel audio data, assumed generated as a possibly underdetermined convolutive mixture of source signals. We work in the Short-Time Fourier Transform (STFT) domain, where convolution is routinely approximated as linear instantaneous mixing in each frequency band. Each source STFT is given a model inspired from nonnegativ...
Explicit consistency constraints for STFT spectrograms and their application to phase reconstruction
Because many acoustic signal processing methods, for example for source separation or noise canceling, operate in the magnitude spectrogram domain, the problem of reconstructing a perceptually good-sounding signal from a modified magnitude spectrogram, and more generally of understanding what makes a spectrogram consistent, is very important. In this article, we derive the constraints which a set of co...